Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WISH: Check with _R_CHECK_SUGGESTS_ONLY_=true to identify missing package dependencies #248

Closed
HenrikBengtsson opened this issue Dec 1, 2022 · 27 comments · Fixed by #249

Comments

@HenrikBengtsson
Copy link

(This replaces invalid feature requests #242, #243, and #244.)

Background

Using _R_CHECK_SUGGESTS_ONLY_=true will run R CMD check with sandboxed an R library, where on declared package dependencies in DESCRIPTION + recursive hard dependencies are included. If an example, package test, or a vignette needs a package that is not in this set of dependencies, it will not be available to be loaded and result in an error.

Wish

Add _R_CHECK_SUGGESTS_ONLY_=true to Renviron.bioc to detect missing package dependencies.

R CMD check sets up a sandboxed R library for it's life span. It uses symbolic file links to do this, so the performance overhead for doing this is really small. I could not detect a performance penalty (see below for details).

Examples

I ran the following examples using Renviron.bioc for Bioconductor 3.17:

wget https://raw.githubusercontent.com/Bioconductor/BBS/master/3.17/Renviron.bioc

I also ran with:

export _R_CHECK_TESTS_NLINES_=17

to make it clear what one of the ERRORs is about.

Example 1: Package bnem (missing RUnit for tests and BiocStyle for vignettes)

Downloading bnem package source tarball:

$ wget https://bioconductor.org/packages/devel/bioc/src/contrib/bnem_1.7.0.tar.gz

Current Bioconductor 3.17 checks

When checking with:

$ R_CHECK_ENVIRON=Renviron.bioc R --vanilla CMD check bnem_1.7.0.tar.gz

package passes with a single NOTE.

* using log directory ‘/tmp/bioc/bnem.Rcheck’
* using R Under development (unstable) (2022-11-30 r83391)
* using platform: x86_64-pc-linux-gnu (64-bit)
* using session charset: UTF-8
* checking for file ‘bnem/DESCRIPTION’ ... OK
* checking extension type ... Package
* this is package ‘bnem’ version ‘1.7.0’
* package encoding: UTF-8
* checking package namespace information ... OK
* checking package dependencies ... OK
* checking if this is a source package ... OK
* checking if there is a namespace ... OK
* checking for hidden files and directories ... OK
* checking for portable file names ... OK
* checking for sufficient/correct file permissions ... OK
* checking whether package ‘bnem’ can be installed ... OK
* checking installed package size ... OK
* checking package directory ... OK
* checking ‘build’ directory ... OK
* checking DESCRIPTION meta-information ... OK
* checking top-level files ... OK
* checking for left-over files ... OK
* checking index information ... OK
* checking package subdirectories ... OK
* checking R files for non-ASCII characters ... OK
* checking R files for syntax errors ... OK
* checking whether the package can be loaded ... OK
* checking whether the package can be loaded with stated dependencies ... OK
* checking whether the package can be unloaded cleanly ... OK
* checking whether the namespace can be loaded with stated dependencies ... OK
* checking whether the namespace can be unloaded cleanly ... OK
* checking loading without being on the library search path ... OK
* checking dependencies in R code ... NOTE
Namespace in Imports field not imported from: ‘rmarkdown’
  All declared Imports should be used.
* checking S3 generic/method consistency ... OK
* checking replacement functions ... OK
* checking foreign function calls ... OK
* checking R code for possible problems ... OK
* checking Rd files ... OK
* checking Rd metadata ... OK
* checking Rd cross-references ... OK
* checking for missing documentation entries ... OK
* checking for code/documentation mismatches ... OK
* checking Rd \usage sections ... OK
* checking Rd contents ... OK
* checking for unstated dependencies in examples ... OK
* checking contents of ‘data’ directory ... OK
* checking data for non-ASCII characters ... OK
* checking data for ASCII and uncompressed saves ... OK
* checking installed files from ‘inst/doc’ ... OK
* checking files in ‘vignettes’ ... OK
* checking examples ... OK
* checking for unstated dependencies in ‘tests’ ... OK
* checking tests ...
  Running ‘runTests.R’
 OK
* checking for unstated dependencies in vignettes ... OK
* checking package vignettes in ‘inst/doc’ ... OK
* checking running R code from vignettes ...
  ‘bnem.rmd’ using ‘UTF-8’... OK
 NONE
* checking re-building of vignette outputs ... OK
* checking PDF version of manual ... OK
* DONE

Status: 1 NOTE
See
‘/tmp/bioc/bnem.Rcheck/00check.log’
for details.

With also _R_CHECK_SUGGESTS_ONLY_=true

When checking with:

$ _R_CHECK_SUGGESTS_ONLY_=true R_CHECK_ENVIRON=Renviron.bioc R --vanilla CMD check bnem_1.7.0.tar.gz

the package fails with two ERRORs:

* checking tests ...
  Running ‘runTests.R’
 ERROR
Running the tests in ‘tests/runTests.R’ failed.
Complete output:
  > BiocGenerics:::testPackage("bnem")
  Error in library("RUnit", quietly = TRUE) : 
    there is no package called 'RUnit'

and

* checking re-building of vignette outputs ... ERROR
Error(s) in re-building vignettes:
  ...
--- re-building ‘bnem.rmd’ using rmarkdown
Error: processing vignette 'bnem.rmd' failed with diagnostics:
there is no package called ‘BiocStyle’
* using log directory ‘/tmp/bioc2/bnem.Rcheck’
* using R Under development (unstable) (2022-11-30 r83391)
* using platform: x86_64-pc-linux-gnu (64-bit)
* using session charset: UTF-8
* checking for file ‘bnem/DESCRIPTION’ ... OK
* checking extension type ... Package
* this is package ‘bnem’ version ‘1.7.0’
* package encoding: UTF-8
* checking package namespace information ... OK
* checking package dependencies ... OK
* checking if this is a source package ... OK
* checking if there is a namespace ... OK
* checking for executable files ... OK
* checking for hidden files and directories ... OK
* checking for portable file names ... OK
* checking for sufficient/correct file permissions ... OK
* checking whether package ‘bnem’ can be installed ... OK
* checking installed package size ... OK
* checking package directory ... OK
* checking ‘build’ directory ... OK
* checking DESCRIPTION meta-information ... OK
* checking top-level files ... OK
* checking for left-over files ... OK
* checking index information ... OK
* checking package subdirectories ... OK
* checking R files for non-ASCII characters ... OK
* checking R files for syntax errors ... OK
* checking whether the package can be loaded ... OK
* checking whether the package can be loaded with stated dependencies ... OK
* checking whether the package can be unloaded cleanly ... OK
* checking whether the namespace can be loaded with stated dependencies ... OK
* checking whether the namespace can be unloaded cleanly ... OK
* checking loading without being on the library search path ... OK
* checking dependencies in R code ... NOTE
Namespace in Imports field not imported from: ‘rmarkdown’
  All declared Imports should be used.
* checking S3 generic/method consistency ... OK
* checking replacement functions ... OK
* checking foreign function calls ... OK
* checking R code for possible problems ... OK
* checking Rd files ... OK
* checking Rd metadata ... OK
* checking Rd cross-references ... OK
* checking for missing documentation entries ... OK
* checking for code/documentation mismatches ... OK
* checking Rd \usage sections ... OK
* checking Rd contents ... OK
* checking for unstated dependencies in examples ... OK
* checking contents of ‘data’ directory ... OK
* checking data for non-ASCII characters ... OK
* checking data for ASCII and uncompressed saves ... OK
* checking installed files from ‘inst/doc’ ... OK
* checking files in ‘vignettes’ ... OK
* checking examples ... OK
* checking for unstated dependencies in ‘tests’ ... OK
* checking tests ...
  Running ‘runTests.R’
 ERROR
Running the tests in ‘tests/runTests.R’ failed.
Complete output:
  > BiocGenerics:::testPackage("bnem")
  Error in library("RUnit", quietly = TRUE) : 
    there is no package called 'RUnit'
  Calls:  -> library
  Execution halted
* checking for unstated dependencies in vignettes ... OK
* checking package vignettes in ‘inst/doc’ ... OK
* checking running R code from vignettes ...
  ‘bnem.rmd’ using ‘UTF-8’... OK
 NONE
* checking re-building of vignette outputs ... ERROR
Error(s) in re-building vignettes:
  ...
--- re-building ‘bnem.rmd’ using rmarkdown
Error: processing vignette 'bnem.rmd' failed with diagnostics:
there is no package called ‘BiocStyle’
--- failed re-building ‘bnem.rmd’

SUMMARY: processing the following file failed:
‘bnem.rmd’

Error: Vignette re-building failed.
Execution halted

  • checking PDF version of manual ... OK
  • DONE

Status: 2 ERRORs, 1 NOTE
See
‘/tmp/bioc2/bnem.Rcheck/00check.log’
for details.

The problem is that bnem is missing:

Suggests: RUnit, BiocStyle

Example 2: Package tradeSeq (missing DelayedMatrixStats for examples and tests)

Downloading tradeSeq package source tarball:

$ wget https://bioconductor.org/packages/devel/bioc/src/contrib/tradeSeq_1.13.0.tar.gz

Current Bioconductor 3.17 checks

When checking with:

$ R_CHECK_ENVIRON=Renviron.bioc R --vanilla CMD check tradeSeq_1.13.0.tar.gz

package passes with three NOTEs.

$ R_CHECK_ENVIRON=Renviron.bioc R --vanilla CMD check tradeSeq_1.13.0.tar.gz
* using log directory ‘/tmp/bioc/tradeSeq.Rcheck’
* using R Under development (unstable) (2022-11-30 r83391)
* using platform: x86_64-pc-linux-gnu (64-bit)
* using session charset: UTF-8
* checking for file ‘tradeSeq/DESCRIPTION’ ... OK
* checking extension type ... Package
* this is package ‘tradeSeq’ version ‘1.13.0’
* package encoding: UTF-8
* checking package namespace information ... OK
* checking package dependencies ... OK
* checking if this is a source package ... OK
* checking if there is a namespace ... OK
* checking for hidden files and directories ... OK
* checking for portable file names ... OK
* checking for sufficient/correct file permissions ... OK
* checking whether package ‘tradeSeq’ can be installed ... OK
* checking installed package size ... NOTE
  installed size is  7.5Mb
  sub-directories of 1Mb or more:
    doc   6.5Mb
* checking package directory ... OK
* checking ‘build’ directory ... OK
* checking DESCRIPTION meta-information ... OK
* checking top-level files ... OK
* checking for left-over files ... OK
* checking index information ... OK
* checking package subdirectories ... OK
* checking R files for non-ASCII characters ... OK
* checking R files for syntax errors ... OK
* checking whether the package can be loaded ... OK
* checking whether the package can be loaded with stated dependencies ... OK
* checking whether the package can be unloaded cleanly ... OK
* checking whether the namespace can be loaded with stated dependencies ... OK
* checking whether the namespace can be unloaded cleanly ... OK
* checking loading without being on the library search path ... OK
* checking dependencies in R code ... NOTE
Namespaces in Imports field not imported from:
  ‘Biobase’ ‘igraph’
  All declared Imports should be used.
* checking S3 generic/method consistency ... OK
* checking replacement functions ... OK
* checking foreign function calls ... OK
* checking R code for possible problems ... NOTE
.earlyDETest: no visible binding for global variable ‘X1’
.earlyDETest: no visible binding for global variable ‘X2’
.findKnots: no visible binding for global variable ‘t1’
.findKnots: no visible binding for global variable ‘l1’
.plotGeneCount: no visible binding for global variable ‘dim1’
.plotGeneCount: no visible binding for global variable ‘dim2’
.plotSmoothers: no visible binding for global variable ‘time’
.plotSmoothers: no visible binding for global variable ‘gene_count’
.plotSmoothers: no visible binding for global variable ‘lineage’
.plotSmoothers_conditions: no visible binding for global variable
  ‘time’
.plotSmoothers_conditions: no visible binding for global variable
  ‘gene_count’
.plotSmoothers_conditions: no visible binding for global variable
  ‘lineage’
.plotSmoothers_conditions: no visible binding for global variable
  ‘pCol’
.plotSmoothers_sce: no visible binding for global variable ‘time’
.plotSmoothers_sce: no visible binding for global variable ‘gene_count’
.plotSmoothers_sce: no visible binding for global variable ‘lineage’
.plotSmoothers_sce: no visible binding for global variable ‘pCol’
Undefined global functions or variables:
  X1 X2 dim1 dim2 gene_count l1 lineage pCol t1 time
Consider adding
  importFrom("stats", "time")
to your NAMESPACE file.
* checking Rd files ... OK
* checking Rd metadata ... OK
* checking Rd cross-references ... OK
* checking for missing documentation entries ... OK
* checking for code/documentation mismatches ... OK
* checking Rd \usage sections ... OK
* checking Rd contents ... OK
* checking for unstated dependencies in examples ... OK
* checking contents of ‘data’ directory ... OK
* checking data for non-ASCII characters ... OK
* checking data for ASCII and uncompressed saves ... OK
* checking installed files from ‘inst/doc’ ... OK
* checking files in ‘vignettes’ ... OK
* checking examples ... OK
* checking for unstated dependencies in ‘tests’ ... OK
* checking tests ...
  Running ‘testthat.R’
 OK
* checking for unstated dependencies in vignettes ... OK
* checking package vignettes in ‘inst/doc’ ... OK
* checking running R code from vignettes ...
  ‘Monocle.Rmd’ using ‘UTF-8’... OK
  ‘fitGAM.Rmd’ using ‘UTF-8’... OK
  ‘multipleConditions.Rmd’ using ‘UTF-8’... OK
  ‘tradeSeq.Rmd’ using ‘UTF-8’... OK
 NONE
* checking re-building of vignette outputs ... OK
* checking PDF version of manual ... OK
* DONE

Status: 3 NOTEs
See
‘/tmp/bioc/tradeSeq.Rcheck/00check.log’
for details.

With also _R_CHECK_SUGGESTS_ONLY_=true

When checking with:

$ _R_CHECK_SUGGESTS_ONLY_=true R_CHECK_ENVIRON=Renviron.bioc R --vanilla CMD check tradeSeq_1.13.0.tar.gz

the package fails with two ERRORs:

* checking examples ... ERROR
Running examples in ‘tradeSeq-Ex.R’ failed
...
> lin <- getLineages(rd, clusterLabels = cl, start.clus = 4)
Error in loadNamespace(x) : 
  there is no package called ‘DelayedMatrixStats’
Calls: getLineages ... loadNamespace -> withRestarts -> withOneRestart -> doWithOneRestart
Execution halted

and

* checking tests ...
  Running ‘testthat.R’ 
 ERROR
Running the tests in ‘tests/testthat.R’ failed.
Last 17 lines of output:
══ Failed tests ════════════════════════════════════════════════════════════════
── Error ('testConditions.R:23'): (code run outside of `test_that()`) ──────────
<packageNotFoundError/error/condition>
Error in `loadNamespace(x)`: there is no package called 'DelayedMatrixStats'
Backtrace:
    ▆
 1. ├─slingshot::as.PseudotimeOrdering(sds) at testConditions.R:23:0
 2. ├─slingshot::as.PseudotimeOrdering(sds)
 3. │ └─slingshot (local) .local(x, ...)
 4. │   └─TrajectoryUtils::rowmean(reducedDim(sds), slingClusterLabels(sds))
 5. │     └─TrajectoryUtils:::.rowstats_w(...)
 6. └─base::loadNamespace(x)
 7.   └─base::withRestarts(stop(cond), retry_loadNamespace = function() NULL)
 8.     └─base (local) withOneRestart(expr, restarts[[1L]])
 9.       └─base (local) doWithOneRestart(return(expr), restart)

[ FAIL 1 | WARN 1 | SKIP 0 | PASS 136 ]
* using log directory ‘/tmp/bioc2/tradeSeq.Rcheck’
* using R Under development (unstable) (2022-11-30 r83391)
* using platform: x86_64-pc-linux-gnu (64-bit)
* using session charset: UTF-8
* checking for file ‘tradeSeq/DESCRIPTION’ ... OK
* checking extension type ... Package
* this is package ‘tradeSeq’ version ‘1.13.0’
* package encoding: UTF-8
* checking package namespace information ... OK
* checking package dependencies ... OK
* checking if this is a source package ... OK
* checking if there is a namespace ... OK
* checking for executable files ... OK
* checking for hidden files and directories ... OK
* checking for portable file names ... OK
* checking for sufficient/correct file permissions ... OK
* checking whether package ‘tradeSeq’ can be installed ... OK
* checking installed package size ... NOTE
  installed size is  7.5Mb
  sub-directories of 1Mb or more:
    doc   6.5Mb
* checking package directory ... OK
* checking ‘build’ directory ... OK
* checking DESCRIPTION meta-information ... OK
* checking top-level files ... OK
* checking for left-over files ... OK
* checking index information ... OK
* checking package subdirectories ... OK
* checking R files for non-ASCII characters ... OK
* checking R files for syntax errors ... OK
* checking whether the package can be loaded ... OK
* checking whether the package can be loaded with stated dependencies ... OK
* checking whether the package can be unloaded cleanly ... OK
* checking whether the namespace can be loaded with stated dependencies ... OK
* checking whether the namespace can be unloaded cleanly ... OK
* checking loading without being on the library search path ... OK
* checking dependencies in R code ... NOTE
Namespaces in Imports field not imported from:
  ‘Biobase’ ‘igraph’
  All declared Imports should be used.
* checking S3 generic/method consistency ... OK
* checking replacement functions ... OK
* checking foreign function calls ... OK
* checking R code for possible problems ... NOTE
.earlyDETest: no visible binding for global variable ‘X1’
.earlyDETest: no visible binding for global variable ‘X2’
.findKnots: no visible binding for global variable ‘t1’
.findKnots: no visible binding for global variable ‘l1’
.plotGeneCount: no visible binding for global variable ‘dim1’
.plotGeneCount: no visible binding for global variable ‘dim2’
.plotSmoothers: no visible binding for global variable ‘time’
.plotSmoothers: no visible binding for global variable ‘gene_count’
.plotSmoothers: no visible binding for global variable ‘lineage’
.plotSmoothers_conditions: no visible binding for global variable
  ‘time’
.plotSmoothers_conditions: no visible binding for global variable
  ‘gene_count’
.plotSmoothers_conditions: no visible binding for global variable
  ‘lineage’
.plotSmoothers_conditions: no visible binding for global variable
  ‘pCol’
.plotSmoothers_sce: no visible binding for global variable ‘time’
.plotSmoothers_sce: no visible binding for global variable ‘gene_count’
.plotSmoothers_sce: no visible binding for global variable ‘lineage’
.plotSmoothers_sce: no visible binding for global variable ‘pCol’
Undefined global functions or variables:
  X1 X2 dim1 dim2 gene_count l1 lineage pCol t1 time
Consider adding
  importFrom("stats", "time")
to your NAMESPACE file.
* checking Rd files ... OK
* checking Rd metadata ... OK
* checking Rd cross-references ... OK
* checking for missing documentation entries ... OK
* checking for code/documentation mismatches ... OK
* checking Rd \usage sections ... OK
* checking Rd contents ... OK
* checking for unstated dependencies in examples ... OK
* checking contents of ‘data’ directory ... OK
* checking data for non-ASCII characters ... OK
* checking data for ASCII and uncompressed saves ... OK
* checking installed files from ‘inst/doc’ ... OK
* checking files in ‘vignettes’ ... OK
* checking examples ... ERROR
Running examples in ‘tradeSeq-Ex.R’ failed
The error most likely occurred in:

Name: plotGeneCount

Title: Plot gene expression in reduced dimension.

Aliases: plotGeneCount plotGeneCount,SlingshotDataSet-method

plotGeneCount,PseudotimeOrdering-method

plotGeneCount,SingleCellExperiment-method

** Examples

set.seed(97)
library(slingshot)
Loading required package: princurve
Loading required package: TrajectoryUtils
Loading required package: SingleCellExperiment
Loading required package: SummarizedExperiment
Loading required package: MatrixGenerics
Loading required package: matrixStats

Attaching package: ‘MatrixGenerics’

The following objects are masked from ‘package:matrixStats’:

colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
colWeightedMeans, colWeightedMedians, colWeightedSds,
colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
rowWeightedSds, rowWeightedVars

Loading required package: GenomicRanges
Loading required package: stats4
Loading required package: BiocGenerics

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:stats’:

IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
as.data.frame, basename, cbind, colnames, dirname, do.call,
duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
pmin.int, rank, rbind, rownames, sapply, setdiff, sort, table,
tapply, union, unique, unsplit, which.max, which.min

Loading required package: S4Vectors

Attaching package: ‘S4Vectors’

The following objects are masked from ‘package:base’:

I, expand.grid, unname

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: Biobase
Welcome to Bioconductor

Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.

Attaching package: ‘Biobase’

The following object is masked from ‘package:MatrixGenerics’:

rowMedians

The following objects are masked from ‘package:matrixStats’:

anyMissing, rowMedians

data(crv, package="tradeSeq")
data(countMatrix, package="tradeSeq")
rd <- slingReducedDim(crv)
cl <- kmeans(rd, centers = 7)$cluster
lin <- getLineages(rd, clusterLabels = cl, start.clus = 4)
Error in loadNamespace(x) :
there is no package called ‘DelayedMatrixStats’
Calls: getLineages ... loadNamespace -> withRestarts -> withOneRestart -> doWithOneRestart
Execution halted

  • checking for unstated dependencies in ‘tests’ ... OK

  • checking tests ...
    Running ‘testthat.R’
    ERROR
    Running the tests in ‘tests/testthat.R’ failed.
    Last 17 lines of output:
    ══ Failed tests ════════════════════════════════════════════════════════════════
    ── Error ('testConditions.R:23'): (code run outside of test_that()) ──────────
    <packageNotFoundError/error/condition>
    Error in loadNamespace(x): there is no package called 'DelayedMatrixStats'
    Backtrace:

    1. ├─slingshot::as.PseudotimeOrdering(sds) at testConditions.R:23:0
    2. ├─slingshot::as.PseudotimeOrdering(sds)
    3. │ └─slingshot (local) .local(x, ...)
    4. │ └─TrajectoryUtils::rowmean(reducedDim(sds), slingClusterLabels(sds))
    5. │ └─TrajectoryUtils:::.rowstats_w(...)
    6. └─base::loadNamespace(x)
    7. └─base::withRestarts(stop(cond), retry_loadNamespace = function() NULL)
    8. └─base (local) withOneRestart(expr, restarts[[1L]])
      
    9.   └─base (local) doWithOneRestart(return(expr), restart)
      

    [ FAIL 1 | WARN 1 | SKIP 0 | PASS 136 ]
    Error: Test failures
    Execution halted

  • checking for unstated dependencies in vignettes ... OK

  • checking package vignettes in ‘inst/doc’ ... OK

  • checking running R code from vignettes ...
    ‘Monocle.Rmd’ using ‘UTF-8’... OK
    ‘fitGAM.Rmd’ using ‘UTF-8’... OK
    ‘multipleConditions.Rmd’ using ‘UTF-8’... OK
    ‘tradeSeq.Rmd’ using ‘UTF-8’... OK
    NONE

  • checking re-building of vignette outputs ... OK

  • checking PDF version of manual ... OK

  • DONE

Status: 2 ERRORs, 3 NOTEs
See
‘/tmp/bioc2/tradeSeq.Rcheck/00check.log’
for details.

The problem is that tradeSeq is missing:

Suggests: DelayedMatrixStats

Performance overhead

Running R CMD check with and without _R_CHECK_SUGGESTS_ONLY_=true on progressr, which is an all OK package with lots of Suggests:ed dependencies that are also "heavy", confirms that there should be little, near-zero, overhead added.

With:

$ _R_CHECK_SUGGESTS_ONLY_=true R_CHECK_ENVIRON=Renviron.bioc command time R --vanilla CMD check progressr_0.11.0.tar.gz
...
Status: OK

94.37user 9.24system 2:52.30elapsed 60%CPU (0avgtext+0avgdata 135824maxresident)k
19768inputs+21888outputs (38major+1897963minor)pagefaults 0swaps

Without:

$ R_CHECK_ENVIRON=Renviron.bioc command time R --vanilla CMD check progressr_0.11.0.tar.gz
...
Status: OK

94.58user 9.57system 2:56.51elapsed 59%CPU (0avgtext+0avgdata 136000maxresident)k
0inputs+21960outputs (0major+1898301minor)pagefaults 0swaps
@hpages
Copy link
Contributor

hpages commented Dec 1, 2022

Thanks @HenrikBengtsson. The fact that this feature is implemented via symlinks makes a lot of sense. I wonder if this works on Windows where symlinks used to be problematic. Even though I noticed that with recent versions of Windows the situation seemed to have improved significantly. Maybe R core decided to make this feature optional because of portability concerns?

Anyways I think we should try to add this to the 3.17 builds @jwokaty. This will introduce a lot of new failures so we should announce on the bioc-devel mailing list! Also we'll need to keep a close eye on what happens on Windows and Mac. Thanks!

@HenrikBengtsson
Copy link
Author

It looks like the win-builder service detects _R_CHECK_SUGGESTS_ONLY_ bugs, e.g.

* checking examples ... [1s] ERROR
Running examples in 'teeny-Ex.R' failed
The error most likely occurred in:
...
> ## This gives an error if _R_CHECK_DEPENDS_ONLY_EXAMPLES_=true
> library(listenv)
Error in library(listenv) : there is no package called 'listenv'
Execution halted

...

* checking tests ... [2s] ERROR
  Running 'envs.R' [0s]
  Running 'installed.packages.R' [1s]
  Running 'libPaths.R' [0s]
  Running 'non-declared-packages.R' [0s]
Running the tests in 'tests/non-declared-packages.R' failed.
Complete output:
  > ## This gives an error if _R_CHECK_SUGGESTS_ONLY_=true
  > library(listenv)
  Error in library(listenv) : there is no package called 'listenv'
  Execution halted

...

* checking re-building of vignette outputs ... [2s] ERROR
Error(s) in re-building vignettes:
--- re-building 'debug.Rmd' using rmarkdown
Quitting from lines 73-74 (debug.Rmd) 
Error: processing vignette 'debug.Rmd' failed with diagnostics:
there is no package called 'listenv'
--- failed re-building 'debug.Rmd'

It also works on MS Windows on MS Windows running on GitHub Actions, e.g. https://github.com/HenrikBengtsson/teeny/actions/runs/3595702927/jobs/6055514686

You could start by enabling this on Linux first, then macOS, and lastly MS Windows.

@hpages
Copy link
Contributor

hpages commented Dec 1, 2022

This looks promising, thanks. The way we deploy Renviron.bioc on the build machines is not platform-specific at the moment so _R_CHECK_SUGGESTS_ONLY_=true will get defined on all the builders. But maybe there's a way to do things like "if platform is Windows then define some variable" in the file itself?

Anyways, we'll cross that bridge when we get there. For now we're going to add this to the devel builds only so not a big deal. Isn't that what devel is about? 😉 We can always reverse if things go wrong.

@vjcitn
Copy link
Contributor

vjcitn commented Dec 1, 2022

I can't reproduce the tradeSeq example.
Using

-rw-rw-r--  1 stvjc stvjc 5336380 Nov  2 11:46 tradeSeq_1.13.0.tar.gz

I ran

_R_CHECK_SUGGESTS_ONLY_=true R_CHECK_ENVIRON=~/Renviron.bioc R --vanilla CMD check tradeSeq_1.13.0.tar.gz

and got ...

 NONE
* checking re-building of vignette outputs ... OK
* checking PDF version of manual ... OK
* DONE

Status: 3 NOTEs
See
  ‘/home/stvjc/BIOC_SOURCES/tradeSeq.Rcheck/00check.log’
for details.

Session Info:

> sessionInfo()
R Under development (unstable) (2022-12-01 r83394)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.5 LTS

Matrix products: default
BLAS:   /home/stvjc/R-43dev-dist/lib/R/lib/libRblas.so
LAPACK: /home/stvjc/R-43dev-dist/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: America/New_York
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rmarkdown_2.18.1

loaded via a namespace (and not attached):
 [1] compiler_4.3.0  startup_0.19.0  fastmap_1.1.0   cli_3.4.1      
 [5] tools_4.3.0     htmltools_0.5.3 knitr_1.41      xfun_0.35      
 [9] digest_0.6.30   rlang_1.0.6     evaluate_0.18  

@HenrikBengtsson
Copy link
Author

I can't reproduce the tradeSeq example.

Could it be that you've installed non-base R packages to the system R library? Which package does:

$ Rscript --vanilla -e "dir(rev(.libPaths())[1])"

list?

@vjcitn
Copy link
Contributor

vjcitn commented Dec 1, 2022

Indeed I have. Everything goes into the system library on my laptop. So the sandbox has no fence?

@vjcitn
Copy link
Contributor

vjcitn commented Dec 1, 2022

The way I had envisioned the sandbox, one would put in the sandbox links to the "base" packages and the packages in the fully traversed dependency graph of all declared dependencies of the package being tested. The check would occur against this sandbox library. But if "base" means "all packages in system library" this will not achieve the aim.

@vjcitn
Copy link
Contributor

vjcitn commented Dec 1, 2022

If the package fails to declare anything needed outside of

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_4.3.0

it should fail.

@HenrikBengtsson
Copy link
Author

Indeed I have. Everything goes into the system library on my laptop. So the sandbox has no fence?

The sandbox runs with the system R library + a temporary R library holding all the declared package dependencies.

The only way to support your use case is for R to know which the base-R packages are and figure out to hide anything else. The only way I can imagine that to work is for R to not include the system R library in these tests, but only the temporary R library. I'm not sure if anyone attempted to do that, but I think it'll be hard, because many of the base-R packages are already loaded when R CMD check runs.

FWIW, for this and other reasons, I argue that R should prevent installing anything into the system R library after R has been installed. Nothing in the R installation folder should be modified after R has been installation. The CRAN-provided R installation for macOS has this problem, where it sets write permissions on the R system library folder, causing install.packages() to install there. This problem has been brought up at https://groups.google.com/g/r-sig-mac/c/KUmNopUuNGI?pli=1 (I think it's a bad design, because it gives macOS users a different experience than other users, e.g. they don't get full checks.)

I argue that not even the "recommended" packages (e.g. codetools and Matrix) should be installed there; they could go into R_LIBS_SITE, but I think it's better to always have them end up in R_LIBS_USER, so that they're treated just like any other R package. Makes it also easier and less confusing when non-privileged users run update.packages().

@HenrikBengtsson
Copy link
Author

If the package fails to declare anything needed outside of

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_4.3.0

it should fail.

There are more base-R packages than those;

$ Rscript --vanilla -e "dir(rev(.libPaths())[1])"
 [1] "base"         "compiler"     "datasets"     "graphics"     "grDevices"   
 [6] "grid"         "methods"      "parallel"     "splines"      "stats"       
[11] "stats4"       "tcltk"        "tools"        "translations" "utils"  

Those packages come with all R installations and are never updated. They're only updated when R is updated. That's a property unique to base-R packages.

@vjcitn
Copy link
Contributor

vjcitn commented Dec 2, 2022

I see your point and I agree that there is value to the typical separation of "system" and "user" packages. When I run your command above I get 980 packages ... and there should be more.... It would be nice to get a clear convention on this.

@HenrikBengtsson
Copy link
Author

HenrikBengtsson commented Dec 2, 2022

I think that's a discussion for the R-devel list, but modulo some outliers, the de-facto standard is to have only base-R and "recommended" packages in the system R library. That's what most R installations install out of the box.

@HenrikBengtsson
Copy link
Author

BTW, as soon as I've installed R from source on Linux as myself, say R v4.2.2, I run chmod -R ugo-w /path/to/R-4.2.2/ to prevent messing with the installation by mistake. That forces install.packages() to install into R_LIBS_USER. If the latter does not exist, then R is designed to ask if it can create such a folder, and then it'll install CRAN and Bioconductor package there instead (just as if R was installed by admin/root in the first place).

@vjcitn
Copy link
Contributor

vjcitn commented Dec 2, 2022

By setting the mode of the lib/R/library folder to 555, I should be able to reproduce the tradeSeq finding on a new installation. Does the BBS do this?

@HenrikBengtsson
Copy link
Author

By setting the mode of the lib/R/library folder to 555 ...

... and move all packages except:

> tools:::.get_standard_package_names()
$base
 [1] "base"      "tools"     "utils"     "grDevices" "graphics"  "stats"    
 [7] "datasets"  "methods"   "grid"      "splines"   "stats4"    "tcltk"    
[13] "compiler"  "parallel" 

$recommended
 [1] "MASS"       "lattice"    "Matrix"     "nlme"       "survival"  
 [6] "boot"       "cluster"    "codetools"  "foreign"    "KernSmooth"
[11] "rpart"      "class"      "nnet"       "spatial"    "mgcv"

to your R_LIBS_USER folder.

@HenrikBengtsson
Copy link
Author

HenrikBengtsson commented Dec 2, 2022

Oh, I think tools:::.get_standard_package_names() is missing translations (looks like a bug in R).

UPDATE: translations is not actually a real package, although it has a DESCRIPTION file and packageDescription("translations") works. It's a folder holding translation messages.

@hpages
Copy link
Contributor

hpages commented Dec 2, 2022

The sandbox runs with the system R library + a temporary R library holding all the declared package dependencies.

Wait.. what? That's really unfortunate. So this means that _R_CHECK_SUGGESTS_ONLY_=true won't work on the builders either where everything is installed in the default library folder. Kind of disappointing that it only works with a very specific setup 😞 And before we start tweaking the setup of the package libraries on the build machines in a way that will please _R_CHECK_SUGGESTS_ONLY_=true, it would be nice to know where _R_CHECK_SUGGESTS_ONLY_ is officially documented so we can actually learn how to make it work.

@HenrikBengtsson
Copy link
Author

Most _R_CHECK_nnn_ variables are explained in Section '8 Tools' of 'R Internals' (https://cran.r-project.org/doc/manuals/r-release/R-ints.html):

The following variables control checks for undeclared/unconditional use of other packages. They work by setting up a temporary library directory and setting .libPaths() to just that and .Library, so are only effective if additional packages are installed somewhere other than .Library. The temporary library is populated by symbolic links to installed packages not also in .Library.

...

_R_CHECK_SUGGESTS_ONLY_:

If set to a true value, running examples, tests and vignettes is done with a temporary library directory populated by all the Depends/Imports/Suggests packages. (As exceptions, packages in a ‘`VignetteBuilder`’ field are always made available.) Default: false (but true for CRAN submission checks: some of the regular checks use true and some use false).

Note, .Library is what I refer to as the system R library, i.e. the above suggests that you should not install "additional packages" there.

There is also Section '1.1.3.1 Suggested packages' of 'Writing R Extensions' (https://cran.r-project.org/doc/manuals/R-exts.html) says:

On most systems, R CMD check can be run with only those packages declared in ‘Depends’ and ‘Imports’ by setting environment variable _R_CHECK_DEPENDS_ONLY_=true, whereas setting _R_CHECK_SUGGESTS_ONLY_=true also allows suggested packages, but not those in ‘Enhances’ nor those not mentioned in the DESCRIPTION file. It is recommended that a package is checked with each of these set, as well as with neither.

To me, it comes down to whether or not you want to find these missing package declarations or not(*). If so, the current design of R requires that you keep .Library free from non-base R packages. For discussions on this design, I suggest bringing it up on the R-devel mailing list.

(*) If you choose not, you'll have packages that won't be fully installed in air-gapped compute environments. Also, those packages cannot be fully tested in reverse-dependency checks, which means they won't benefit from the revdep checks and vice versa. CRAN packages don't have these problems, since CRAN has had these types of checks in place for years.

(I struggle and spend lots of extra time with the revdep failures every time there are Bioconductor packages involved - unless I choose not to care about Bioconductor packages.)

@HenrikBengtsson
Copy link
Author

BTW, if you install everything to the system R library, how do you keep separate Bioconductor release and devel libraries during half of the year when they both share the same R version/installation?

@vjcitn
Copy link
Contributor

vjcitn commented Dec 2, 2022

This last question is really something each user/developer has to address separately, unfortunately. In my case, I wind up with variously named versions of R devoted to one or another bioc stream. As for the build system, since release and devel are on different hardware the question doesn't apply, I think.

I'd like to think that the task of identifying undeclared dependencies could be solved without addressing this concern. The fact that the behavior of R CMD check varies from installation to installation is driven by the fact that R itself has installation-dependent behaviors and uses. The dependency analysis doesn't have to come out of the R CMD check process, though it would be nice if it did. Another approach may be needed.

@HenrikBengtsson
Copy link
Author

HenrikBengtsson commented Dec 2, 2022

The dependency analysis doesn't have to come out of the R CMD check process, though it would be nice if it did. Another approach may be needed.

Note that these type of missing package dependencies can only be detected at run-time. Static-code analysis will not be able to detect those, unless it fully emulates the R engine. Thus, you need a way that only tests with the base-R packages without any additional packages installed beyond the initial R installation.

As a refresher, _R_CHECK_SUGGESTS_ONLY=true R CMD check solves this by basically checking with:

$ R_LIBS="$(mktemp -d)" R_LIBS_SITE=NULL R_LIBS_USER=NULL Rscript --vanilla -e ".libPaths()"
[1] "/tmp/hb/tmp.nQBwsilHpr"                                 
[2] "/path/to/R-4.2.2/lib/R/library"

Importantly, it is not possible to remove or replace .Library ("system R library"). It is fixed and always there at the end of the library search path no matter how hard you try, e.g.

$ R_LIBS=NULL R_LIBS_SITE=NULL R_LIBS_USER=NULL Rscript --vanilla -e ".libPaths()"
[1] "/path/to/R-4.2.2/lib/R/library"

$ R_LIBS=NULL R_LIBS_SITE=NULL R_LIBS_USER=NULL Rscript --vanilla -e ".Library"
[1] "/path/to/R-4.2.2/lib/R/library"

So, installing other packages to .Library will make it impossible to test without them. Changing that, or get _R_CHECK_SUGGESTS_ONLY_ et al. to work also in that case, will most likely be a major change to R, e.g. internal flags making non-base R packages "invisible" to the package loading functions, while still making them available to the sandboxed library. I doubt that such a re-implementation will happen any time soon. My guess is that it would be easier to convince R Core to lock down .Library such that only base-R packages can be installed there. R is already prepared for that, and I argue it's almost assumed that most R installation work that way. We have R_LIBS, R_LIBS_SITE, and R_LIBS_USER libraries to fall back on. And if that wouldn't be enough, each of those can specify multiple library folders (colon separated on Unix, semicolon on MS Windows). R is designed to handle this, so not a big surprise for end-users either. In a fresh R setup, without any priorly installed R packages, we get:

$ R --quiet --vanilla
> chooseCRANmirror(ind = 1)
> install.packages("listenv")
Warning in install.packages("listenv") :
  'lib = "/path/to/R-4.2.2/lib/R/library"' is not writable
Would you like to use a personal library instead? (yes/No/cancel) yes
Would you like to create a personal library~/R/x86_64-pc-linux-gnu-library/4.2to install packages into? (yes/No/cancel) yes
trying URL 'https://cloud.r-project.org/src/contrib/listenv_0.8.0.tar.gz'
...
* DONE (listenv)

> find.package("listenv")
[1] "/home/hb/R/x86_64-pc-linux-gnu-library/4.2/listenv"

After this, all R packages will be installed to the above R_LIBS_USER folder.

The fact that the behavior of R CMD check varies from installation to installation is driven by the fact that R itself has installation-dependent behaviors and uses.

I agree this is unfortunate, and it's even more unfortunately that Simon's macOS CRAN binary makes .Library writable, because it means lots of macOS users are unaware of this problem. Otherwise, it's only people who install from source R as themselves that end up with a writable .Library folder. I would love to have R lock down .Library after installation. BTW, malicious, or faulty, R code can easily break your R base packages, and thereby R too, if .Library is writable.

That said, all the tools are there to detect missing dependencies. I don't see a good reason for Bioconductor servers having to install other packages to .Library (if that's the current setup - not yet confirmed). All it takes is to lock down .Library is:

## Make .Library read-only
chmod -R ugo-w "$(Rscript --vanilla -e "cat(.Library)")"

## Install other packages in a library that is specific
## to the current platform, version of R, and version of Bioconductor
BIOC_VERSION=3.17
R_LIBS_USER=~/R/%p-library/%v-bioc_${BIOC_VERSION}
mkdir -p "${R_LIBS_USER}"
export R_LIBS_USER

That also have the benefit that you can use the same R installation for different Bioconductor versions. All R packages will from now on be installed under:

Rscript --vanilla -e ".libPaths()[1]"
[1] "x86_64-pc-linux-gnu-library/4.2-bioc_3.17"

Since R_LIBS_USER was precreated, there will be no approval prompt (as in the above example).

@hpages
Copy link
Contributor

hpages commented Dec 2, 2022

I don't see a good reason for Bioconductor servers having to install other packages to .Library (if that's the current setup - not yet confirmed).

I confirm. Sure, we could change that. However that's the default setup when you install R from source as a regular user on Linux (i.e. if you skip the make install step), which is what we do on the build machines, what I do on my laptop, and I suspect what many many developers do.

So even if we decided to change the setup on the build machines, users/developpers won't have it so won't be able to reproduce what they see on the build report. I really like the idea of identifying undeclared dependency, but not at any cost. With the _R_CHECK_SUGGESTS_ONLY_=true, the cost is:

  • significant changes to the setup of the build machines
  • probably some adjustments to the BBS code
  • not reproducible (unless developers organize their package libraries in a way that is compatible with R_CHECK_SUGGESTS_ONLY=true)

So after some internal discussion, we're kind of hesitating to do this. We'll need a little bit more time to think about it.

Thanks for your input on this. It's appreciated and we've learned a lot.

@HenrikBengtsson
Copy link
Author

if you skip the make install step), which is what we do on the build machines, what I do on my laptop, and I suspect what many many developers do.

Why would you do that? It's a step that only takes 10-20 seconds to complete. Aren't you worried that you're running with non-intended things in the R installation?

So even if we decided to change the setup on the build machines, users/developpers won't have it so won't be able to reproduce what they see on the build report.

The way I see it, the current setup is making life easier for package developers (because they don't have to fix these bugs because they are not reported in the first place). The cost for this is an increased burden on end-users who have to deal with incomplete packages, because of incomplete testing.

significant changes on the build machine setups

Hmm, did you see my four line code snippet how to do this?

not reproducible (unless developers organize their package libraries in a way that is compatible with R_CHECK_SUGGESTS_ONLY=true)

Huh, with a clean .Library, one cannot reproduce the current Bioconductor result. On the contrary, in order to reproduce the Bioconductor server results, a developer has to install all 22,000+ CRAN and Bioconductor packages in .Library. Plus, making sure they are all the same version as on Bioconductor. Who does that? And how do you do that on CI services such as Travis CI and GitHub Actions?

I'm honestly surprised I have to argue this much for higher-coverage testing that is already available and implemented.

Just to be clear, all Bioconductor package with _R_CHECK_SUGGESTS_ONLY_=true mistakes are missing out on all the revdep checks that the bigger packages on CRAN do, including tidyverse package, etc. This means that Bioconductor is more likely to all of a sudden suffer from breaking changes in their dependencies, which are otherwise can be detected are reported to the maintainer, who then can prepare for the change before it hits then end-users. This is because these package cannot be fully tested if these package lack some dependencies in examples or package tests. Thus far, I've taken the extra step to manually inspect such broken Bioconductor packages, and I see them all the time when I check matrixStats and other packages of mine, and I try to report upstream to Bioconductor package maintainers. I don't have resources to keep doing that manual work, so I will stop doing that.

@hpages
Copy link
Contributor

hpages commented Dec 3, 2022

Aren't you worried that you're running with non-intended things in the R installation?

We do out-of-tree compilation so we end up with a clean installation, as documented here: https://github.com/Bioconductor/BBS/blob/master/Doc/Prepare-Ubuntu-20.04-HOWTO.md#configure-and-compile-r This is a very convenient way to manage different R installations under a non-sudoer account on a Linux machine. No need to sudo make install, which wouldn't work anyways on our builders since we install R and run the builds from a non-sudoer account for security reasons.

Anyways, it doesn't matter so much what we do on the build machines. As I said, we could change that. The issue is that this is a very common way to install R among developers, and that produces a setup that is not compatible with _R_CHECK_SUGGESTS_ONLY_=true.

We also install Simon's binaries on Mac, like 99% of Mac users, and that also produces a setup that is not compatible with _R_CHECK_SUGGESTS_ONLY_=true.

I'm honestly surprised I have to argue this much for higher-coverage testing that is already available and implemented.

Higher-coverage testing is good. But in the case of _R_CHECK_SUGGESTS_ONLY_=true it's not that it's already available and implemented and ready to be used out-of-the-box. Now I understand why this is an optional feature and not built in R CMD check. No need to argue this much: as I said we're hesitating to do this and want to take the time to carefully evaluate such change. We've run the builds for the last 17 years without enabling this feature so I hope it's ok if this takes a few days or weeks.

@HenrikBengtsson
Copy link
Author

No need to sudo make install, which wouldn't work anyways on our builders since we install R and run the builds from a non-sudoer account for security reasons.

With configure --prefix=/path/to/non-root-folder/R-4.2.2-yada, there's no need for sudo; make install will install to the prefix folder.

I said we're hesitating to do this and want to take the time to carefully evaluate such change. We've run the builds for the last 17 years without enabling this feature so I hope it's ok if this takes a few days or weeks. ...

Ok, sorry, I interpreted it as this was a done deal.

@hpages
Copy link
Contributor

hpages commented Mar 24, 2023

We've made some progress on this and will announce something soon. Developers might want to come here to provide some feedback so I'm re-opening the issue.

@hpages hpages reopened this Mar 24, 2023
@HenrikBengtsson
Copy link
Author

Thank you sooo much for doing this! I'm confident that this will increase the quality of several Bioconductor packages.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants